Description

[0001] The present invention relates to a multi-proc- 
essor system with shared memory. 
[0002] It is known that to obtain high performance in
data processor systems use is made of multi-processor
architectures in which a plurality of processors
simultaneously perform a plurality of processes by dividing
the tasks.

[0003] Cooperation between processors requires that 
the processors exchange information and messages 
and that they are able to operate on the same data. 
[0004] The processors must therefore be connected 
together and to at least one working memory by means 
of suitable communication channels. 
[0005] It is also known that the technology offers 
working memories having large capacity and low cost, 
but which have read/write times very much greater than 
the operating times of the processors. 
[0006] In order fully to exploit the power offered by the 
processors use is therefore made of fast local memories 
or caches of limited capacity, each associated with a
processor, and a plurality of individually and independ- 
ently addressable working memories. 
[0007] The addressable memory space is thus distrib- 
uted between several units or banks of memory with "in- 
terleaving" criteria which minimise the probability of con- 
flict in accessing the plurality of memories by several 
processors. 
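
As a minimal sketch of one possible interleaving criterion (the description does not say which criterion is used), low-order interleaving places consecutive blocks in consecutive banks. The bank count, the block size and the function name are assumptions chosen for illustration only.

```python
# Hypothetical low-order interleaving; both constants are assumed values.
NUM_BANKS = 6      # assumed number of separately addressable banks
BLOCK_BYTES = 32   # assumed size of one interleaved block

def bank_of(address: int) -> int:
    """Return the index of the bank holding the block that contains `address`."""
    return (address // BLOCK_BYTES) % NUM_BANKS

# Consecutive blocks fall in different banks, so sequential accesses by
# several processors are less likely to collide on a single bank.
banks = [bank_of(block * BLOCK_BYTES) for block in range(8)]
```

With these assumed values the first eight blocks map to banks 0 to 5 and then wrap around to bank 0 again.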

[0008] The adoption of fast local memories in which 
data stored in the working memory can be replicated, 
for more rapid access, gives rise to problems of coher- 
ence or, using Anglo Saxon terminology, problems of 
data "consistency". 

[0009] The adoption of several memory modules 
gives rise to interconnection problems between the var- 
ious processors and the various modules. 
[0010] The state of the art offers two architectural
approaches which at least partially resolve these
problems.

1) "bus" architecture or multipoint communication 
channels. 

[0011] All the processors and all the memories of the 
system are connected to a single system bus, which 
constitutes a time-shared resource to which the proces- 
sors and possibly the memories have access in compe- 
tition with one another for limited and non-overlapping 
time intervals. 

[0012] Access to the system bus is assigned, on re- 
quest of the various units, by arbitration logic of unitary 
or distributed type, which resolves the access conflicts 
according to pre-established criteria. 
[0013] This type of architecture offers essentially two
advantages:

- the operations for interconnecting two units
(processor and memory) via the system bus are all
serialised and ordered with respect to one another and
allow an easy management of the communication
process;

- all the processors connected to the system bus can
see all the transactions taking place on the system 
bus; it is therefore possible to assure in real time 
the consistency of the data with relatively simple 
"snooping" or observation mechanisms. 


[0014] On the other hand, the disadvantages or limi- 
tations of the architecture are considerable: 

- each wire of the system bus is connected to a large
number of input and output loads; driver circuits
having a power suitable to the load, and which are
therefore relatively slow, are therefore necessary
for signals on the various wires.

[0015] The essentially capacitive nature of the loads
limits the signal frequency which can be transferred and
therefore the speed of transfer of the information or
"transfer rate" of the system bus.
[0016] Sharing the same resource for read/write
operations between several units increases the access
conflicts and consequently the problems of response,
that is to say waiting for access to the bus and for
possible subsequent reception of the requested
information. The response time is determined not only by
the slowness of a memory unit in responding, but also by
possible bus access conflicts, the probability of which is
higher the higher the number of units competing for bus
access, and the longer the time required to transfer
significant information along the bus and to carry it away,
thereby freeing the bus.

2) The "crossbar switch" architecture or connection 
cross bar architecture. 

[0017] The processors and the memory are
interconnected in pairs by means of a plurality of
communication channels which intersect and are selectively
interconnected by the selective closure of switches.
[0018] The advantages of this type of architecture are:

- more pairs of units are able to intercommunicate
simultaneously on separate channels;
- the matrix interconnection makes it possible to
reduce the RC loads of the various communication
lines.

[0019] It is therefore possible to use control circuits of
lower power, and to operate at a higher frequency.
[0020] The transfer rate which can be achieved is very
high not only because it allows a greater frequency of
the transferred signals but also due to the many
simultaneous parallel transfers. Moreover, the paired
interconnection of units, generally maintained for several
successive transactions, allows channelling or
"pipelining" of the transactions and a further increase of the
transfer rate which can be achieved without creating
response time problems for the other processes for the
majority of the time the resource is occupied.
[0021] On the other hand, even in this case there are 
significant disadvantages: 

- the simultaneous transfer on many pairs of
interconnections impedes "snooping" between
processors, and the coherence of the data may deteriorate
in an environment in which it is replicated in several
memories or stores.

[0022] To ensure coherence it is necessary to re- 
nounce simultaneous transfers (at least of addresses). 

- the problems of "routing" of the signals, of
termination of the components and of management of
the interconnections become particularly onerous.

[0023] This disadvantage is obviated by the multi- 
processor system forming the subject of the present in- 
vention, in which a plurality of processors each commu- 
nicate with a shared memory, constituted by a plurality 
of separately addressable modules by means of a sys- 
tem bus (or multi-point connection) for transfer of ad- 
dresses and commands and by means of point-to-point 
data channels which connect each processor individu- 
ally to a data cross bar interconnection logic and via this 
to memory modules. 

[0024] There is thus realised a hybrid architecture 
which associates the advantages of the bus system ar- 
chitecture and the cross bar architecture and which: 

- Allows ordered pipelining of several transfers
between the same processor and the memory.
- Reduces the loads of the point-to-point data transfer
channels between individual processors and
memory, making it possible to operate at high transfer
frequencies.
- Allows parallel transfers to take place which involve
different resources.
- Allows ordered serialisation of access to the memory.
- Allows all the processors to perform "snooping" of
the address channel and to ensure data consistency in
real time in the case of architectures with data
replicated in local memories or caches.

[0025] According to a further aspect of the multiproc- 
essor system according to the present invention the 
memory modules are separately controlled to read or 
write in such a way as to operate with partial overlap of 
operating time and, therefore, as autonomous memory 
units, but are addressed via a common control unit con- 
nected to the system bus or address bus. 



[0026] The memory control unit also fulfils the role of 
arbitration unit for access to the system bus. 
[0027] In this way the load of the address bus is
reduced to the single processors and to the memory control
unit.

[0028] According to another aspect of the multiproc- 
essor system according to the present invention the da- 
ta cross bar logic is provided with input/output registers 
both to the shared memory and to the processors.
[0029] With this transfer structure in cascade from
register to register, several transfers can be made in
parallel and "pipelining" from and to memory is possible
with partial overlap of transfer time even if the data
crossbar functions as collector for all the data
exchanges with the memory through a single data channel.
This channel constitutes a node which does not limit the
rate of transfer of data because the time necessary for
transfer of data through the node can be kept very short.
[0030] According to a further aspect of the
multiprocessor system according to the present invention
the interconnection logic is provided, as well as with buffer
registers (or buffers), with channels having a different
parallelism for connection to memory and to the different
processors, that between memory and data crossbar
being of NxM bytes, whilst that between data crossbar
and processors is only of N bytes.
[0031] Whilst the transfer of information between
memory and data crossbar takes place simultaneously
for blocks of NxM bytes, the transfer between data
crossbar and processors takes place by serialising the
operation in M successive phases, during the course of each
of which a block of N bytes is transferred.
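
The serialisation described above can be sketched as follows. This is only an illustration under assumed widths (N = 8, M = 4); the function name `serialise` is hypothetical and does not appear in the description.

```python
def serialise(block: bytes, n: int) -> list[bytes]:
    """Split a block of N*M bytes into M successive phases of N bytes each."""
    if len(block) % n != 0:
        raise ValueError("block length must be a multiple of n")
    return [block[i:i + n] for i in range(0, len(block), n)]

N, M = 8, 4                      # assumed channel widths, for illustration
block = bytes(range(N * M))      # one NxM-byte block as read from memory
phases = serialise(block, N)     # M phases on the processor-side channel
```

Concatenating the phases in order reproduces the original block, which is the property the processor-side channel relies on.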
[0032] The serialisation does not create response
problems because the connections between data
crossbar and processors are dedicated and there is no mutual
interference.

[0033] The higher parallelism of the memory with
respect to that of the processors on the other hand makes
it possible to adapt the memory capacity or throughput
to the possible requests of the processors, which operate
at a higher speed, and at the same time to contain within
acceptable limits the number of terminations of the various
electronic components or units and the passive
electronic connections between the various units.

[0034] The limitation on the number of terminations
can be imposed not only for reasons of economic
convenience or of technical practicability of electronic
components with a large number of input/output terminals,
but also by the requirements of convenience of use of
commercially available components with interfaces
directed to the use of standard communication buses.
[0035] At the interface level, in fact, the hybrid
interconnection architecture on which the multi-processor
system the subject of the present invention is based is
seen as a conventional standard bus, for example of the
"VME or FUTURE BUS" type.
[0036] The characteristics and advantages of the
multiprocessor system according to the present invention
will become clearer from the following description of a
preferred embodiment of the invention and from the
attached drawings, in which:

Figure 1 is a block schematic representation of a
multi-processor system with shared memory having
architecture structured in accordance with the
present invention;
Figure 2 is a block schematic diagram of an
embodiment of a data cross bar for the architecture of
Figure 1;
Figure 3 is a block schematic diagram of an
embodiment of a control unit for the system of Figure 1;
Figure 4 is a timing diagram showing the operation
of the multi-processor system of Figure 1;
Figure 5 is a block schematic diagram of a preferred
embodiment of a data cross bar for the system of
Figure 1;
Figure 6 is a timing diagram showing the operation
of the multi-processor system with a data cross bar
of the type shown in Figure 5; and
Figure 7 is a block schematic diagram of a variant
embodiment of the multi-processor system of
Figure 1.

[0037] Figure 1 shows in block schematic form a
multi-processor system with shared memory and architecture
structured in accordance with the present invention.
[0038] The system comprises a plurality of processors
1, 2, 3, 4, each provided with a buffer memory or
cache 6, 7, 8, 9, a system memory 5 constituted by a
plurality of modules 10, 11, 12, 13, 113, 114 (preferably
greater in number than the number of processors), a
timer unit TIM UNIT which generates a timing signal CK of
a predetermined frequency, a system memory control
unit 15 (SMC) for controlling the system memory and
arbitration of the system bus, and a data channel control
logic 16 or data cross bar (DCB).
[0039] The processors 1, 2, 3, 4 are connected
together and to the system memory control unit (SMC) 15
by means of a bus (ACBUS) 17 for transferring addresses
and commands.

[0040] Via suitable wires of this bus, and with a
conventional arbitration and communication protocol, the
processors send to the SMC unit (15) requests ABREQ
for access to the bus and receive individually a bus grant
signal ABGRANT, following which the granted processor
can effectively occupy the bus 17 and transfer to the SMC
unit 15 a memory address and the signals which identify
the requested operation such as reading, writing or
another type (RWIM).

[0041] The system bus ACBUS 17 constitutes a
multi-point communication channel except possibly, but not
necessarily, for the transfer of bus access requests
ABREQ, the corresponding bus grant response
ABGRANT and the various processor state signals which
are preferably exchanged between each of the
processors and the unit 15 on point-to-point connections.
[0042] The unit 15 transfers to the memory 5, via a
channel MADDR 18, the read/write address accompanied
by suitable timing commands (STARTA, STARTB,
STARTC, STARTD, STARTE, STARTF), which, in
dependence on the address, select and activate one of the
various memory modules 10, 11, 12, 13, 113, 114.
[0043] In each module a register AR holds the
read/write address for the whole of the time necessary, even
if the time for which the address lasts on the channel
MADDR 18 is limited.

[0044] The transfer of data, on the other hand, takes
place via the point-to-point connections selectively
formed by the unit DCB 16 on the basis of timing
commands received from the unit 15, between each of the
processors 1, 2, 3, 4 and a memory data input/output
channel MDAT 19 or between pairs of processors.
[0045] In each memory module a register DW holds
one unit of data to write, received from the channel
MDAT 19, for the whole of the time necessary for the
write operation.

[0046] In Figure 1 the processors 1, 2, 3, 4 are
connected to the unit DCB 16 via the data channels I/OD1,
I/OD2, I/OD3, I/OD4 respectively.

[0047] The operation of the whole system is
synchronous and the various units are all clocked by the periodic
signal CK generated by the unit 14.
[0048] Figure 2 shows in block schematic form the
architecture of the unit DCB 16, made in the form of an
integrated circuit.

[0049] If the parallelism of the data channel is so high
as not to allow it to be made as a single integrated circuit,
the unit DCB 16 can be made as a plurality of identical
integrated circuits according to the known concept of
"bit slicing" or division of the logic by groups of bits.

[0050] The unit DCB 16 essentially comprises four
groups of receivers 21, 22, 23, 24 for the input of data
from the channels I/OD1, I/OD2, I/OD3, I/OD4, four
groups of control circuits or drivers 25, 26, 27, 28 for the
introduction of data onto channels I/OD1, I/OD2, I/OD3,
I/OD4, a group of drivers 29 for the introduction of data
onto the channel 19, a group of receivers 35 for input
into DCB of data coming from the channel 19, and five
multiplexers 30, 31, 32, 33, 34.

[0051] The inputs of the multiplexer 30 are connected
to the outputs of the groups of receivers 21, 22, 23, 24
and the outputs are connected to the inputs of the group
of drivers 29 so as selectively to connect one of the
channels I/OD (i) to the channel MDAT 19 when the
drivers 29 are enabled.

[0052] Each of the multiplexers 31, 32, 33, 34 is
associated with one of the channels I/OD (i) and has four
sets of inputs, each respectively connected to the
outputs of the receivers 35, 21, 22, 23, 24, with the receiver
of the associated channel I/OD (i) being excluded.
[0053] The outputs of the multiplexers 31, 32, 33, 34
are connected respectively to the inputs of the drivers
25, 26, 27, 28 so as selectively to connect the channel
MDAT 19 to one of the channels I/OD and/or, possibly
at the same time, to connect two I/OD channels
together.

[0054] The operation of the multiplexers and the
drivers is controlled by suitable commands SEL1 ... SELN
generated by a decoder 36 which receives suitable
commands from the memory control unit 15 via the lines
20.

[0055] The commands are clocked by the signal CK. 
[0056] It is immediately noted that, for example, it is
possible simultaneously to connect, and without data
collision, the channel I/OD1 as a source of data jointly
to the channel MDAT 19 and to one of the other I/OD
channels, or two I/OD channels together whilst a third
I/OD channel is connected to the channel MDAT 19.
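
The collision-free combinations mentioned above can be checked with a small model. This is a sketch under assumptions: the port names, the `(source, destination)` representation and the two rules in the docstring are illustrative and are not taken from the description of the decoder 36.

```python
PORTS = {"MDAT", "IOD1", "IOD2", "IOD3", "IOD4"}  # assumed port names

def collision_free(connections: list[tuple[str, str]]) -> bool:
    """Check (source, destination) pairs under two assumed rules:
    each destination channel is driven by one source only, and a
    bidirectional channel is not source and destination at once."""
    sources = {src for src, _ in connections}
    destinations = [dst for _, dst in connections]
    if not (sources | set(destinations)) <= PORTS:
        return False                      # unknown port name
    if len(destinations) != len(set(destinations)):
        return False                      # two sources drive one channel
    return sources.isdisjoint(destinations)

# I/OD1 drives MDAT and I/OD2 at the same time.
fan_out = collision_free([("IOD1", "MDAT"), ("IOD1", "IOD2")])
# Two I/OD channels are joined while a third one talks to MDAT.
parallel = collision_free([("IOD1", "IOD2"), ("IOD3", "MDAT")])
# Two processors try to drive MDAT in the same period.
clash = collision_free([("IOD1", "MDAT"), ("IOD2", "MDAT")])
```

Both examples from the paragraph pass the check, while two sources driving MDAT in the same period are rejected.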
[0057] Figure 3 shows in block schematic form the 
structure of the system bus arbitration and memory con- 
trol unit 15, which can be made as an integrated circuit. 
[0058] The unit 15 comprises logic 70 for arbitrating 
access to the system bus (ABUS ARB UNIT), finite state
logic 72 (STATE MACHINE), a pair of registers 73, 74, 
a decoder 75 and OR logic circuits 76. 
[0059] The arbitration logic 70, of conventional type,
receives at its input, via point-to-point connections
between the various processors, bus access request
signals ABREQ (i) and in an entirely conventional manner,
with timing controlled by the signal CK, grants access
to the system bus by sending a response signal
ABGRANT (i) on one of a plurality of point-to-point
connections with the various processors on a one at a time
basis.

[0060] The logic 70 is preferably an integral part of the 
control unit 15, but can also be replaced by arbitration 
logic distributed throughout the processors in a known 
way, in which case the arbitration signals can be ex- 
changed by means of multipoint connections. 
[0061] The unit 15 receives command signals via the
system bus ACBUS which define the operation to be
performed, in particular a signal R/W which indicates if
the requested operation is read or write and a signal
RWIM which indicates a read operation with the
intention of modifying the unit of data read. Other commands
which can be present lie outside the scope of the
invention and are not necessary for its understanding.
[0062] The commands are accompanied on the sys- 
tem bus by the memory address of where the operation 
is to be performed. 

[0063] It is to be noted that the commands and the 
address are put on the system bus only after a processor 
has obtained access to the bus and jointly identify the 
resource (for example a memory module) which may be 
already engaged in the execution of other operations. 
[0064] To avoid keeping the system bus occupied
whilst waiting for the resource to become free, the unit
15, after having analysed the contents of the commands
and the address, responds in this case with a signal
RETRY: the command is thus refused and the requesting
processor is invited to re-present it.
[0065] In this way the commands are executed only
when the necessary resources are available. This en- 
sures that the commands, if executed, are executed in 
a predetermined time which depends only on the exe- 
cution speed of the resources involved. Therefore, in the 
case of reading from memory, the order in which the da- 
ta are provided from memory is the same order as that 
in which the commands have been accepted. 
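
The accept-or-RETRY decision can be illustrated with a minimal model. It is a sketch only: the class name, the `"ACCEPT"` string and the fixed cycle of four clock periods are assumptions made for the example, not details of the unit 15.

```python
CYCLE = 4  # assumed fixed number of clock periods per memory operation

class MemoryControl:
    """Toy model: a command is accepted only if its module is free."""

    def __init__(self, modules):
        # First clock period in which each module is free again.
        self.free_at = {m: 0 for m in modules}

    def request(self, module: str, period: int) -> str:
        if period < self.free_at[module]:
            return "RETRY"               # refused; the processor re-presents it
        self.free_at[module] = period + CYCLE
        return "ACCEPT"                  # will complete in a known time

smc = MemoryControl("ABCDEF")
answers = [
    smc.request("A", 4),   # module A is free
    smc.request("A", 5),   # module A is still busy
    smc.request("B", 5),   # a different module is free
    smc.request("A", 8),   # module A has finished
]
```

Because an accepted command always completes after the same number of periods in this model, data come back in the order in which commands were accepted.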
[0066] The commands and address received by the
unit 15 are held in the register 73, clocked by the signal
CK, and decoded by the decoder 75 (with inputs
connected to the outputs of the register 73).
[0067] Essentially the decoder determines, on the ba- 
sis of the address and the commands, which memory 
module (A, B, C, D, E or F) to use and if the requested 
operation is writing (R) or not. It can also identify, in de- 
pendence on the address, a transfer of data not destined 
for memory but for one of the processors which is iden- 
tified with the signals I/O. 

[0068] The output signals from the decoder are 
passed to the finite state logic 72 clocked by the signal 
CK and which at each period of the signal CK progress- 
es as a function of the previously received signals. 
[0069] Since, as already mentioned, as a conse- 
quence of the RETRY mechanism each operation re- 
quested by the processors, if executed, is executed in 
a predetermined time, the state logic is able on the basis 
of signals received in that time, to keep track of the re- 
source state in the current clock period and in the sub- 
sequent periods. 

[0070] The unit 72 therefore provides at its output a 
signal EN which enables loading of the address and 
commands present in the register 73 into the output reg- 
ister 74 only if the necessary resources are available in 
the predetermined and necessary time interval. 
[0071] The register 74 is loaded, as well as with the 
address and commands, with the signals A, B, C, D, E, 
F, only one of which is asserted at a time, and which 
when sent to the memory 5 on the channel MADDR 18
selects and activates one of the memory modules 
(STARTA, STARTB, STARTC, STARTD, STARTE, 
STARTF) in a mutually exclusive manner. 
[0072] In dependence on the memory operation acti- 
vated, the unit 72 also transfers, via the channel 20, suit- 
able timed commands for control of the data cross bar 
16 (Figure 1). 

[0073] Finally, with the commands OENA, B, C, D, E, F
the selected module is enabled to transfer, in the case
of reading, the data read onto the channel MDAT 19.
[0074] This only occurs if, during reading, an "inter- 
vention" condition does not occur following "snooping" 
as will now be considered. 

[0075] In multi-processor systems with data replica- 
tion in caches the consistency of the data is ensured 
essentially by two approaches: 

1) immediate writing of each modified data unit in
memory or "write through";
2) deferred writing of the modified data in memory
only when the opportunity arises ("write back" or
"copy back").

[0076] The first approach requires writing to memory
each time a unit of data is modified in a processor cache 
and involves a significant use of the bus and memory 
resources for which reason it has fallen out of favour. 
[0077] The second approach presupposes that all the 
processors observe the read requests sent to memory 
to check if the reading relates to a unit of data present 
in their cache in modified form, an updated copy of which
does not exist in memory. 

[0078] In this case the processor in the cache of which
the modified data is present must signal this
circumstance to the other processors and send the data to the
requesting processor, taking the place of the memory,
the output of which is blocked by not asserting the
signal OENA, B, C, D.
[0079] Advantageously the unit 15 is arranged to op- 
erate with the second approach (but which is easily 
adaptable to the first approach), which simplifies the ex- 
change of "snooping" signals between processors. 
[0080] For this purpose the unit 15 receives from the
various processors, via the point-to-point connections,
SNOOP OUT (i) state signals sent from the various
processors with suitable timing, to signal that a read
request presented on the system bus ACBUS relates to
data absent from cache (SNOOP OUT = NULL), present
and valid in cache and therefore shared at least with the
memory (SNOOP OUT = SHARED), or present in cache
and modified with respect to that contained in memory
(SNOOP OUT = MODIFY).

[0081] The SNOOP OUT (i) state signals can also
indicate that it was not possible to perform the
"SNOOPING" operation, for example because the processor was
engaged, or else, in the case of data transferred
between processors, because the transferred data cannot be
received, and in both cases that the transaction cannot
be completed and must be repeated (SNOOP OUT =
RETRY).
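
The four SNOOP OUT answers can be summarised in a short sketch. The cache representation (a dictionary mapping addresses to `"valid"` or `"modified"`), the `busy` flag and the function name are assumptions for illustration; the description does not specify how a processor derives its answer.

```python
def snoop_out(cache: dict[int, str], address: int, busy: bool = False) -> str:
    """Answer of one processor to a read request observed on the bus.

    `cache` maps addresses to "valid" or "modified" (assumed representation).
    """
    if busy:
        return "RETRY"       # snooping could not be performed
    state = cache.get(address)
    if state is None:
        return "NULL"        # data absent from this cache
    return "MODIFY" if state == "modified" else "SHARED"

cache = {0x100: "valid", 0x140: "modified"}
```

For example, `snoop_out(cache, 0x140)` yields `"MODIFY"`, the case in which the processor must later intervene with its own copy of the data.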

[0082] These signals are received by the finite state 
logic 72 which takes account of them in defining the sys- 
tem state and the operations to be controlled. 
[0083] As will be seen in more detail below, if the
received signal is indicative of MODIFY, a processor must
intervene to present the unit of data on its I/OD(i)
channel after having had confirmation from the unit 15 that
the "modify" operation must be performed.
[0084] The finite state logic 72 suitably controls, via 
the channel 20, the connections to be established be- 
tween the various points of the data cross bar, giving 
maximum priority to the intervention request and, in the 
case of conflict for the use of the resources with a trans- 
action already in progress, stopping this transaction and 
signalling, by assertion of the RETRY signal, that the 
operation must be repeated. 

[0085] The SNOOP OUT (i) signals received from the
various processors are then combined in an OR logic
76 which also receives the RETRY signal from the unit
72, asserted as necessary, and which produces ARESP
output signals, transferred on the multipoint connections
of the system bus to the various processors and
indicative of the possible states of the system corresponding
to NULL, SHARED, MODIFY or RETRY.
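
One way an OR combination could yield the four ARESP states is sketched below. The two-bit encoding is purely an assumption (the description names the states but not their coding). With this encoding SHARED and MODIFY asserted together would alias to RETRY, a case the sketch assumes cannot arise because a modified copy is held by one cache only.

```python
from functools import reduce
from operator import or_

# Assumed two-bit encoding; only the state names come from the description.
NULL, SHARED, MODIFY, RETRY = 0b00, 0b01, 0b10, 0b11
NAMES = {NULL: "NULL", SHARED: "SHARED", MODIFY: "MODIFY", RETRY: "RETRY"}

def aresp(snoop_out: list[int], smc_retry: bool = False) -> str:
    """Bitwise-OR the per-processor SNOOP OUT codes with the RETRY of unit 72."""
    initial = RETRY if smc_retry else NULL
    return NAMES[reduce(or_, snoop_out, initial)]

shared_case = aresp([NULL, SHARED, NULL])
modify_case = aresp([NULL, MODIFY, NULL])
retry_case = aresp([NULL, NULL, NULL], smc_retry=True)
```

Under this encoding any single RETRY, whether from a processor or from the control unit, dominates the combined response.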
[0086] The timing diagram of Figure 4 depicts in a 
concise form the operation of the system of Figure 1 and 
the unit 16 of Figure 2. 

[0087] In particular the diagram CK depicts the state/ 
level of the clock signal CK with time. 
[0088] The diagram ABREQ (i) depicts the state of the 
access requests which the various processors can send 
to the unit 15. 

[0089] The diagram is cumulative in the sense that it 
represents the electrical level of several communication 
lines, one per processor. 

[0090] Similarly, the diagram ABGRANT (i) cumula- 
tively depicts how the state of the response signals sent 
out by the unit 15 to the various processors vary with 
time. 

[0091] The diagram ACBUS depicts variations in the 
state of the signals which define an address and asso- 
ciated command (read/write), transmitted from each of 
the processors in different time intervals, onto the sys- 
tem bus. 

[0092] The diagram SNOOP OUT(i) depicts the
cumulative variation with time in the state of the signals
sent from the various processors to the unit 15 as a
result of the continuous observation which they keep on
the addresses present on the system bus ACBUS.
[0093] The diagram ARESP depicts the variation in 
time of the state of the signals emitted from unit 15 onto 
two lines of the system bus in response to the reception 
of an address and the SNOOP OUT(i) signals. 
[0094] With these signals the unit 15 informs all the 
processors that the associated transaction cannot be 
performed because the resources necessary for its ex- 
ecution are not available in the required time intervals 
and therefore the transaction must be requested again 
(RETRY), or else that the transaction relates to data not 
shared by several processors (NULL) or shared 
(SHARED) or modified by a processor (MODIFY). 
[0095] The unit 15 grants access to the processors 
one at a time according to pre-established priority crite- 
ria (for example to the processor which last obtained ac- 
cess a longer time ago) and in dependence on the tem- 
poral availability of the resources necessary for the ex- 
ecution of the transaction. 

[0096] The diagram MADDR represents the state of 
the address channel 18 which connects the unit 15 to 
the memory 5. 

[0097] Finally, the diagram I/OD(i) represents the
cumulative variation with time in the state of the various
data channels and the unit DCB 16.
[0098] As can be appreciated, the clock signal CK
defines a plurality of successive time intervals or clock
periods P1, P2 ... P13 in which the clock signal, initially
at level 0 or asserted (the relation between logic level
and electrical level is entirely irrelevant) changes to level
1.

[0099] The various signals are asserted and
de-asserted for times and durations not greater than the clock
period and the transition of the clock signal from 0 to 1
at the centre of each period defines the instant in which
the state of the signals is stable and can be "strobed" or
recognised.

[0100] With these premises it is possible to examine 
how the various transactions possible between units of 
the system progress. 

[0101] These transactions are essentially of four 
types: 

1) access to memory 5 by a processor i to read an
item of data; this transaction is activated by asser- 
tion of the signal ABREQ (i) by the processor fol- 
lowed by sending a read command together with 
the address. 

2) access to memory 5 by a processor i for writing 
an item of data: this transaction is activated by as- 
sertion of the signal ABREQ (i) followed by sending 
a write command together with the address. 

3) intervention by a processor (i) in the read trans- 
action activated by another processor Y to provide 
the processor Y with an item of data in substitution 
for that read from memory.

[0102] This transaction is activated by signalling to the
memory control unit 15 via the SNOOP OUT (i) lines that 
the item of data has been modified and is available in 
the processor i. 

4) I/O message or direct interprocessor
communication; with this transaction a processor i sends an
item of data directly to a processor Y which for ex- 
ample fulfils a control function for peripheral appa- 
ratus. 

[0103] The transaction differs from a write operation 
only because the address identifies a space outside the 
memory and is processor (or I/O) specific. 
[0104] We now examine in detail the diagrams of
Figure 4 which, by way of example, show that in the time
period P1 one (or more) ABREQ (i) signals are asserted.
[0105] The arbitration unit 15, having received the
requests, grants access to the processor 1 by asserting
the signal ABGRANT (1) in the time period P2. The
access is granted on the basis of predetermined priority
criteria, for example to the processor which obtained
access to the bus longest ago.
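
The example priority criterion, granting the processor whose last access lies furthest in the past, can be sketched as follows. The function name, the `last_granted` bookkeeping and the tie-break on the lower processor number are assumptions made for the illustration.

```python
from typing import Optional

def grant(requests: set[int], last_granted: dict[int, int]) -> Optional[int]:
    """Pick the requester whose last grant is oldest.

    A processor never granted before counts as period -1; ties go to the
    lower processor number (an assumed tie-break).
    """
    if not requests:
        return None
    return min(requests, key=lambda p: (last_granted.get(p, -1), p))

last = {1: 5, 2: 3, 3: 9}            # clock period of each processor's last grant
winner = grant({1, 2, 3}, last)      # processor 2 was served longest ago
newcomer = grant({1, 4}, last)       # processor 4 has never been served
```

A real arbiter would also update `last_granted` after each grant; that step is omitted here to keep the sketch short.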

[0106] Having received the signal ABGRANT (1) the
processor 1 puts onto the ACBUS channel a memory
address which identifies for example the module A
(period P3).

[0107] The unit 15 receives this address and checks 
that the module A is free, that is to say, is not already
engaged in a read/write operation, and activates it by 
putting the address received onto the channel MADDR 
18 (period P4) by generating suitable module activation 
and selection signals. 

[0108] Still by way of example, the module A,
addressed and activated during period P4, outputs during
period P7 on MDAT channel 19 the information read.
[0109] In other words, by way of example, a read cycle
requires four clock periods for its execution.
[0110] During period P7 the control unit 15 enables
output from the module A and commands unit DCB 16
via lines 20 in such a way as to connect the channel 19
to the channel I/OD1 so as to transfer the data at the
output from module A to the processor 1. With this, the
read operation requested by processor 1 is concluded.
[0111] It is evident that during the periods from P4 to 
P7 other read or write operations of module A cannot 
be activated. Moreover, it is not possible to use the chan- 
nel MADDR 18 during period P4 for addressing other 
modules, and likewise it is not possible to use the channel
MDAT 19 and the unit DCB 16 during period P7 to trans-
fer other data between memory and another of the I/OD
channels. 

[0112] These occupied resource conditions are taken
account of by the finite state logic of the control unit 15.
[0113] However, once the read operation has been
started in module A the MADDR channel can be freed
and therefore other operations relating to modules B,
C, D, E or F can be activated. Before considering these,
it is convenient to conclude the examination of the transac-
tion started between processor 1 and module A.
[0114] The address present on the system bus AC- 
BUS during period P3 is received not only by the unit 15 
but also by the processors 2, 3, 4 which are set up to 
check whether or not information identified by the same 
address is present in their cache and in which form 
(shared, modified). 

[0115] If it is not present or it is only shared, the vari- 
ous processors send SNOOP OUT(i) signals with this 
signification (NULL/S) to the unit 15 during period P4. 
[0116] During period P5 the unit 15 confirms, by send-
ing to all the processors the signal ARESP with the sig-
nification NULL/S, that no updating action to the state of
the cache is requested at the various processors.
[0117] We now suppose that during period P3 proc- 
essor 2 is granted access to the system bus for a read 
operation. 

[0118] During period P4 the processor 2 puts an ad-
dress on the system bus for a read operation in module
A (2 > A). This address is received by the unit 15 which
has just activated a read cycle in module A.
[0119] Therefore unit 15, having checked that the re-
source constituted by module A is not available, does
not transfer the address onto channel MADDR and, hav-
ing received confirmation from the processors 1, 3, 4
that the reading does not involve an item of data con-
tained in their cache (SNOOP OUT(i) = NULL), signals
to all the processors (ARESP = RETRY) that the reading
will not be executed and that the processor 2 must re-
peat the read request (period P6).
[0120] Therefore during period P7 the processor 2
reasserts the signal ABREQ (2) and during period P8
the unit 15 asserts the signal ABGRANT (2) (assuming
that requests having a higher priority are not simultane-
ously made by other processors).
[0121] Thus during period P9 the processor 2 can 
again put the address onto the bus ACBUS and request 
a read operation in module A. 

[0122] In this case since the necessary resources are 
free the operation takes place: the address is trans- 
ferred from the control unit 15 onto channel MADDR 18 
(period P10) and the item of data requested is received
by the processor 2 via the channel MDAT 19, the unit
DCB 16 and the channel I/O D2 during period P13.
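
The module-busy check and the resulting RETRY of [0118] to [0122] can be modelled with a small sketch. It assumes, as in the example, that a module is activated in the period after its address appears on ACBUS and stays engaged for four periods. The class name `MemoryControl`, the return value `"ACCEPTED"` and the `busy_until` table are hypothetical names introduced for illustration; conflicts on the MADDR and MDAT channels are deliberately not modelled.

```python
READ_CYCLE = 4  # periods a module stays engaged (P4..P7 in the example)


class MemoryControl:
    """Toy model of the module-availability check only (an assumption-laden sketch)."""

    def __init__(self):
        self.busy_until = {}  # module name -> last period in which it is engaged

    def request(self, module, period):
        """Handle an address seen on ACBUS in `period` for `module`."""
        start = period + 1  # the module would be activated one period later
        if self.busy_until.get(module, 0) >= start:
            return "RETRY"  # module still engaged: the processor must repeat
        self.busy_until[module] = start + READ_CYCLE - 1
        return "ACCEPTED"


mc = MemoryControl()
r1 = mc.request("A", 3)  # processor 1, address in P3, module A busy P4..P7
r2 = mc.request("A", 4)  # processor 2, address in P4, module A engaged
r3 = mc.request("A", 9)  # processor 2 repeats in P9, module A busy P10..P13
```

With these inputs `r1` is accepted, `r2` is answered with RETRY and `r3` is accepted, with module A engaged up to period P13, the period in which the text places the data transfer to processor 2.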
[0123] We now suppose that the processor 3, having 
obtained access to the bus, during period P5 puts on 
the bus ACBUS an address directed at module C for a 
read operation. 

[0124] Since module C is free the read operation can 
be started by unit 15 and takes place according to the 
temporal flow already seen and which it is not necessary 
to repeat. This is because, in the hypothesis, the data 
item read in module C is not present in any of the proc- 
essor caches. 

[0125] If, on the other hand, the data item is present 
in a cache and has been modified, the transaction 
progresses in a different manner: for example during pe- 
riod P6 the processor 4, which by hypothesis has ob- 
tained access to the bus ACBUS, puts on the bus an 
address for module B. 

[0126] The control unit 15 transfers the address onto
channel MADDR (period P7) and activates module B but
also receives, with the SNOOP OUT(i) signals, an indi-
cation that the data item requested is present in the
cache of another processor (for example the processor
3: SNOOP OUT(3) = MODIFY).

[0127] Therefore the unit 15 signals to all the proces-
sors with ARESP = MODIFY (period P8) that the ad- 
dressed data item will not be provided from memory but 
from a processor. 
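
The way the unit 15 turns the individual SNOOP OUT(i) indications into a single ARESP signal can be illustrated as below. This is a sketch under assumptions: the function name `aresp` and its arguments are hypothetical, and giving MODIFY precedence over RETRY is an assumption of this sketch, since the description does not state how the two are ranked when both conditions occur together.

```python
def aresp(snoop_out, resources_free):
    """Combine snoop results into the broadcast ARESP value.

    snoop_out      -- dict: snooping processor number -> "NULL", "S" or "MODIFY"
    resources_free -- True if the addressed module and channels are available
    """
    if "MODIFY" in snoop_out.values():
        return "MODIFY"  # data will come from a processor cache, not memory
    if not resources_free:
        return "RETRY"   # requester must repeat the request later
    return "NULL/S"      # no cache state update is required


# Processor 4 reads module B while processor 3 holds a modified copy.
response = aresp({1: "NULL", 2: "NULL", 3: "MODIFY"}, resources_free=True)
```

Here `response` is `"MODIFY"`, matching the case of [0126] and [0127].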

[0128] The processor 3 recognises that its request 
has been acknowledged. In the period P10 the unit 15 
controls the unit DCB 16 in such a way as to allow the 
transfer of the modified data item from the processor 3 
to the processor 4 via the channel I/O D3, the unit DCB 
16 and the channel I/O D4, whilst the data item read in 
module B is not transferred from the output of the mod-
ule because the signal OEN B is not asserted.
[0129] Advantageously the output data from the proc- 
essor 3 is also transferred to module B for writing into 
the module in substitution for the pre-existing data item. 
[0130] The last type of transaction to be considered 
is that of writing. 

[0131] For example the processor 1 asserts a request
ABREQ (1) for access to the system bus in the period
P8 and, there not being any other request for access
having greater priority, obtains access to the bus AC-
BUS (period P9, ABGRANT1 signal asserted).
[0132] During period P10 it therefore puts an address 
(1 > B) for module B and a write command on the chan- 
nel ACBUS. 

[0133] On the hypothesis that the unit 15 does not 
identify a resource conflict, during the subsequent peri- 
od P11 the address is transferred to channel MADDR 
18 and the data item to be written is transferred from 
channel I/O D1 to channel MDAT 19. 
[0134] If the resource is not available, either because the
memory module is engaged or because during period
P11 the channel MDAT 19 will be engaged (following a
MODIFY signal), transfer of the data item and address
will be blocked and during the period P11 the unit 15 will
assert a RETRY signal.

[0135] It is worth noting that a write request presented
jointly with a modify request (MODIFY) asserted by an- 
other processor, can be treated in two different ways. 
[0136] If when each modify is asserted it is arranged 
that the corresponding data item is updated in memory 
before writing, the write request collides with the modify 
request, which has greater priority, and is not allowed 
access to the system bus. 

[0137] If on the other hand the read operation which 
has just caused the intervention of another processor 
with a MODIFY signal, is a read operation with intent to 
modify RWIT (that is to say, it is already known that the 
read data item will be modified), updating of the data 
item in the memory is useless and it is possible to grant 
access to the system bus for a write operation even if a 
request for access to one of the data channels is simul- 
taneously present for a transfer operation of a data item 
from one processor to another. 
[0138] That is to say a temporal superimposition of 
two data transfers is achieved without conflict, which in 
a conventional system bus architecture would not be 
possible. 

[0139] This superimposition is also possible between
interprocessor data exchange operations and memory 
read operations by resolving possible access collisions 
with the RETRY signal. 

[0140] In this hypothesis, if for example during period
P10 the processor 3 asserts a data write access request
(ABREQ3 asserted) and during period P11 the unit 15
grants access to the system bus (ABGRANT3 asserted),
the processor 3 puts on the bus ACBUS an address 
which identifies the operation as I/O intended for proc- 
essor 1 (period P12). 

[0141] The unit 15, having verified that the I/O oper-
ation is intended for processor 1, and that there are no
resource conflicts, commands unit DCB 16 in such a 
way as to transfer the data item from channel I/O D3 to 
channel I/O D1 at the same time that the unit DCB is 
commanded to transfer a data item read in memory from 
channel MDAT 19 to channel I/O D2. 
[0142] The preceding description refers to an archi-
tecture such as that in Figure 1 in which the data cross
bar 16 does not have any holding element. 
[0143] The transfer of a data item via the unit 16 there-
fore takes place in a single time period.
[0144] For this reason it is therefore arranged that the 
data to be output from any of the processors is trans- 
ferred in a time period subsequent to that in which the 
address is presented to the system bus: this gives the 
unit 15 time to check if the necessary resources for the 
requested operation are available. 
[0145] The transfer of data through the unit 16 in a 
single time period presupposes that the data propa-
gates through the whole length of the channel I/OD(i),
through the unit 16 and through the channel MDAT 19 
within this time period. 

[0146] The clock period is therefore given a lower limit 
by this condition. 

[0147] According to a further aspect of the present in- 
vention the cross bar interconnection logic is provided 
with input holding registers each disposed immediately 
downstream of the receivers 21, 22, 23, 24 and 35 and
output holding registers disposed immediately up- 
stream of the output drivers 25, 26, 27, 28 and 29. 
[0148] With the adoption of only input holding regis-
ters the data path is divided into two branches, each
traversed in one of two successive clock periods,
each of very much shorter duration even if overall of du-
ration equivalent to that considered with reference to
Figure 3.

[0149] With the adoption of input and output holding 
registers the path of data is divided into three branches 
each traversed in one of three successive clock periods. 
[0150] In both cases a very high transfer frequency 
on the channel MDAT 19 is obtained and the partial su-
perimposition of the data transfer phases from different
channels is also obtained. 

[0151] This subdivision of the data path further allows 
the adoption of different transfer parallelism between 
memory and DCB and between DCB and processors 
which makes it possible to reduce significantly the 
number of terminals of individual processors. 
[0152] Figure 5 schematically represents in block di- 
agram form a preferred embodiment of the data cross 
bar which employs these and other innovative concepts. 
[0153] In Figure 5 the functional elements corre- 
sponding to the elements already shown in the diagram 
of Figure 2 have been indicated with the same reference 
numerals. 

[0154] The data channel MDAT 19 constituted by 64
+ 8 lines allows transfer of data to and from memory with
8 byte parallelism (a double word) accompanied by an
8 bit error correction code.

[0155] The data channel 19 is connected to the re- 
ceivers 35 and to the drivers 29. 
[0156] The output of the receivers 35 is connected to 
a holding register 37 for holding data received from 
memory. 

[0157] The outputs of the register 37 are connected
to syndrome generation logic 38 (SYNDR GEN) and to
an error correction network 39 (DATA CORRECTION)
of conventional type.

[0158] The syndrome generator 38 analyses informa-
tion received, recognises possible correctable and non-
correctable errors (in which latter case it provides an "er-
ror not correctable" output signal) and controls the logic
39 for correction of correctable errors.
[0159] It also associates with each byte of information
a parity control bit which it transfers to the logic 39.
[0160] The logic 39 presents 8 byte information, each
byte accompanied by a parity bit, on an output channel 40.
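
The association of one parity bit with each of the 8 data bytes, giving the 72 bits carried on channel 40, can be sketched as follows. The description does not state whether parity is even or odd, nor where the parity bit sits; even parity and a parity bit in the least significant position are assumptions of this sketch, and the function name `add_parity` is hypothetical.

```python
def add_parity(double_word):
    """Return eight 9-bit values: each data byte followed by its parity bit.

    Assumption: even parity, i.e. the parity bit makes the number of ones
    in each 9-bit group even.
    """
    if len(double_word) != 8:
        raise ValueError("a double word is 8 bytes")
    groups = []
    for byte in double_word:
        parity = bin(byte).count("1") & 1  # 1 if the byte has an odd number of ones
        groups.append((byte << 1) | parity)  # 8 data bits + 1 parity bit
    return groups


groups = add_parity(bytes([0x00, 0x01, 0x03, 0xFF, 0x10, 0x80, 0x7F, 0xAA]))
total_bits = 9 * len(groups)  # width of channel 40 in this model
```

In this model `total_bits` is 72, which agrees with the 18 wire width (two bytes plus two parity bits) of the channels 46, 47, 48 described further on.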
[0161] The output channel 40 distributes information
to four groups of logic circuits 41, 142, 143, 144, each
dedicated to a processor channel.

[0162] Since the groups 41, 142, 143, 144 are identi-
cal to one another only the group 41 dedicated to the
channel I/O D1 has been shown in detail. 
[0163] The group 41 comprises a first 72 bit register
42 the inputs of which are connected to the channel 40
and the outputs of which are connected, in groups of 18
elements, to form input groups of a multiplexer 31 hav-
ing 11 groups of inputs, each of 18 elements.
[0164] The outputs of the multiplexer 31 are connect-
ed to the inputs of an 18 cell register 44 the outputs of
which are connected to the inputs of the drivers 25 the
outputs of which lead to the channel I/O D1.
[0165] Four groups of inputs 45 of the multiplexer 31 
are connected directly to the channel 40. 

[0166] The remaining three groups of inputs are con-
nected respectively to three channels 46, 47, 48 each
of 18 wires, on to which the logic groups 142, 143, 144
respectively feed two bytes of information, each accom-
panied by a parity bit.

[0167] The multiplexer 31, controlled by suitable se-
lection signals generated by the decoder 20, allows the
register 44 to be loaded in succession, and therefore to
transfer in succession to the channel I/O D1, pairs of
bytes extracted from the double word present on the
channel 40 or held in the register 42. It also allows pairs
of bytes (and associated parity bits) coming individu-
ally or in succession from the channels I/O D2, D3, D4
via the logic groups 142, 143, 144 respectively to be
transferred onto channel I/O D1.

[0168] The possibility offered by the multiplexer of di-
rectly selecting a double byte from the channel 40
makes it possible simultaneously to load the register 44
with a double byte present on the channel 40 and to load
the register 42 with the double word present on the
channel 40.

[0169] In this way transfer to the processor of the dou-
ble byte expressly addressed by a read operation can
be made very fast. The remaining double bytes held in
the register 42 can be made to follow this double byte
in a suitable order.
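
One possible reading of "a suitable order" is sketched below: the addressed double byte is sent first and the other three follow in wrap-around sequence. The wrap-around order is purely an assumption made for illustration, as is the function name `transfer_order`; the description only requires that the addressed double byte can be delivered first.

```python
def transfer_order(addressed, groups=4):
    """Order in which the double bytes of a double word are sent to the processor.

    addressed -- index (0..groups-1) of the double byte named by the read address
    Assumption: the remaining double bytes follow in wrap-around order.
    """
    if not 0 <= addressed < groups:
        raise ValueError("double byte index out of range")
    return [(addressed + k) % groups for k in range(groups)]


order = transfer_order(2)  # the read addressed the third double byte
```

Under this assumption `order` is `[2, 3, 0, 1]`: the requested double byte is selected directly from channel 40, and the others are taken from register 42 in the following periods.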

[0170] The reading flow will, however, be considered
in more detail below.

[0171] For writing data in memory or for transfer of
data between processors, the data cross bar includes,
in unit 41, a receiver group 21 with inputs connected to
the channel I/O D1, and outputs connected to an 18 cell
register 49.

[0172] The outputs of the register 49 are connected
to a channel 50 which distributes the information held in
the register 49 to the logic groups 142, 143, 144 (in particular
to multiplexers equivalent to the multiplexer 31).
[0173] The outputs of the register 49 are also connect-
ed to the inputs of a second register 51 the outputs of
which are connected to the inputs of a third register 52 
and parity error check logic (PCHECK). 
[0174] The outputs of the register 52 are in turn con- 
nected to the inputs of a fourth register 54 the outputs 
of which are connected to the inputs of a fifth register 
55 in cascade. 

[0175] Whilst the registers 49 and 51 have 18 cells, 
the registers 52, 54, 55 have 16 cells, it being superflu- 
ous to hold the parity bits. 

[0176] The byte outputs of the registers 51, 52, 54, 55
are connected to a first group of inputs 57 of a multiplex- 
er 56 having four channels each of 64 bits. 
[0177] The other groups of inputs 58, 59, 60 are re-
spectively connected to logic groups 142, 143, 144 cor-
responding to the group 41 and associated with data
channels I/OD2, I/OD3, I/OD4.
[0178] The outputs of the multiplexer 56 are connect- 
ed to the inputs of an 8 bit code generation logic 61 for 
detection and correction of errors (ECC GEN) and to the 
inputs of a 72 bit register 62 which also receives on 8 
inputs the ECC code generated by the logic 61.
[0179] The outputs of the register 62 are connected 
to the inputs of the output drivers 29 leading to the mem- 
ory channel MDAT 19. 

[0180] Figure 6 is a timing diagram showing the be- 
haviour of the DCB logic of Figure 5. 
[0181] The various lines of the diagram identified by 
the same signal names have the significance already 
seen with reference to Figure 4. 
[0182] For simplicity the SNOOP OUT diagram is
omitted, whilst to the I/OD(i) diagram which represents
the state of the connection channels to the processors
is added the diagram DIREG which represents the state
of the DCB input registers such as the register 49 and
37 of Figure 5, the diagram MDAT which represents the
state of the memory data channel MDAT 19, the diagram
DOREG which represents the state of the input register
37 to the DCB from memory, and the diagram DO(i)
which represents the state of the DCB output registers
44 to the data channels.

[0183] With the division into branches of the data path
it is possible to use very short clock periods CK, for ex-
ample of 10 nsec, and to occupy the bus ACBUS or the
memory channel MDAT for only two periods (20 nsec at
each transfer).

[0184] At each transfer it is possible to transfer 8 bytes
of data (two words) or multiples thereof.
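
As a worked example of these figures, the peak rate on the channel MDAT follows directly from the 10 nsec clock period, the two-period occupation and the 8 bytes moved per transfer. The constant and function names are hypothetical, and the result is a peak figure that ignores RETRY cycles and idle periods.

```python
CLOCK_NS = 10             # example clock period CK from the description
PERIODS_PER_TRANSFER = 2  # MDAT is occupied for two periods per transfer
BYTES_PER_TRANSFER = 8    # one double word per transfer


def peak_bytes_per_second(clock_ns=CLOCK_NS,
                          periods=PERIODS_PER_TRANSFER,
                          nbytes=BYTES_PER_TRANSFER):
    """Peak MDAT throughput in bytes per second, using integer arithmetic."""
    transfer_ns = clock_ns * periods           # 20 nsec per transfer
    return nbytes * 1_000_000_000 // transfer_ns


peak = peak_bytes_per_second()
```

With the example values this gives 400,000,000 bytes per second, i.e. 8 bytes every 20 nsec.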

[0185] At the data channel level of the processor the
transfer can take place with a succession of partial
transfers of two bytes at a time each executed during a
clock period by exploiting the fact that in the architecture
described each channel has available dedicated and
"buffered" resources.

[0186] This carries the possibility of superimposing
the times of data transfer between several different
I/O channels and memory.

[0187] At the same time the memory data channel
MDAT and the address bus of the system constitute two
nodes which impose sequential and ordered flow of data
and addresses and allow the management and control
of the various operations without the necessity of asso-
ciating correlation labels to the said data and addresses,
which become entirely superfluous.

[0188] Considering Figure 6, a generic processor 1 
asserts, in period P1, a request for access ABREQ1, 
receiving during period P3 grant of access to the system 
bus and to the data channel. 
[0189] During periods P5 and P6 the processor 1
therefore puts an address and associated commands 
on bus ACBUS. 

[0190] The address is transferred from the memory
control unit 15 onto the memory address channel MADDR
during periods P8 and P9.

[0191] In the meantime the processor 1 has put a dou-
ble byte of data on the channel I/O D1 during period P5,
which is held in register 49 (Figure 5) during the period
P6, and transferred during the subsequent periods grad-
ually from the register 49 to the registers 51, 52, 54, 55.
[0192] During period P10 the first double byte re- 
ceived through the channel I/O D1 is held in the register 
55. 

[0193] During period P6 the processor 1 puts a sec-
ond double byte of data on the channel I/O D1, which
having been transferred from register 49 into the
cascaded registers 51, 52, 54 is held in register 54 start-
ing from period P10.

[0194] In the same way the processor 1 puts a third
and a fourth pair of bytes onto the channel I/O D1 during
periods P7 and P8, which are held and available respec-
tively in registers 52 and 51 starting from period P10.
[0195] Thus the processor 1 effects transfer of 8 bytes
in successive pairs in the course of the four periods P5 ...
P8 and starting from period P10 the 8 bytes are available
in parallel at the output of the multiplexer 56.
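
The serial-to-parallel conversion of [0191] to [0195] can be followed period by period with a small simulation of the register cascade 49, 51, 52, 54, 55. This is a behavioural sketch only: the function name `shift_in` is hypothetical, parity bits are ignored, and the byte order used to assemble the 64 bit word is an assumption.

```python
def shift_in(pairs, first_period=5, read_period=10):
    """Simulate the cascade 49 -> 51 -> 52 -> 54 -> 55, one shift per period.

    pairs        -- byte pairs put on channel I/O D1 in consecutive periods,
                    starting in `first_period`
    read_period  -- period at which the register contents are inspected
    Returns the contents of each register at `read_period`.
    """
    chain = {49: None, 51: None, 52: None, 54: None, 55: None}
    order = [55, 54, 52, 51, 49]  # shift from the tail of the cascade backwards
    for period in range(first_period + 1, read_period + 1):
        for dst, src in zip(order, order[1:]):
            chain[dst] = chain[src]
        # Register 49 holds what was on the channel in the previous period.
        idx = period - first_period - 1
        chain[49] = pairs[idx] if idx < len(pairs) else None
    return chain


chain = shift_in([b"AB", b"CD", b"EF", b"GH"])
# Assumed byte order: first pair received is the most significant.
double_word = chain[55] + chain[54] + chain[52] + chain[51]
```

At period P10 the first pair is in register 55, the second in 54, the third in 52 and the fourth in 51, so the four registers together present the complete double word to the multiplexer 56.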
[0196] During periods P12, P13 the multiplexer 56 is
enabled and the information transferred to the register
62 where it is held, maintaining the output data on the
channel MDAT 19.

[0197] If another processor 2 asserts the signal
ABREQ2 during period P3 and receives access
ABGRANT2 during period P5 for a write operation in a
different module from that already engaged by the proc-
essor 1 and the necessary resources are free, the proc-
essor 2 can start and complete the write operation by
putting an address on the bus ACBUS during periods
P7, P8 and by putting in succession four pairs of bytes
onto the channel I/O D2 during periods P7 ... P10.
[0198] This information is copied and maintained in 
the register 62 during periods P14, P15. 
[0199] The transfer from the two processors 1 and 2
to memory therefore takes place in partial temporal su-
perimposition.

[0200] A read operation proceeds with a substantially 
similar flow. 

[0201] For example an access request presented dur- 
ing period P5 from the processor 1 obtains grant of ac- 
cess during period P7 such that the bus ACBUS can be 
occupied with an address during periods P9, P10. 
[0202] The address is transferred onto channel MADDR
18 during periods P12, P13.
[0203] The data item read is available on channel 
MDAT 19 for example during periods P20, P21 and is 
held in register 37 (diagram DOREG) during periods
P21, P22.

[0204] In period P22 the multiplexer 31 and the reg- 
ister 42 can be controlled so as to transfer a pair of bytes 
to the register 44 and to load all the 8 bytes received 
from memory into the register 42. 
[0205] In period P22 the channel MDAT 19 and the 
register 37 are therefore free to transfer and hold other
information destined for example for another processor. 
[0206] In the period P23 the double byte held in the 
register 44 can be transferred onto channel I/O D1 whilst 
the register 44 is loaded with a double byte selected by 
the multiplexer 31 from among those held in the register
42.

[0207] During periods P24, P25, P26 the subsequent three
pairs of bytes are transferred onto channel I/O D1 and 
the transfer is completed. 

[0208] It is evident that reading too can take place with 
the transfer operations partially superimposed with oth- 
er reading operations. 

[0209] For example during period P3 if an access re- 
quest by processor 2 were asserted, associated with a 
read operation rather than a write operation, during pe- 
riods P7, P8, still on the hypothesis that resources are 
available, the data item read would be present on the 
channel MDAT during periods P18, P19 and would be 
loaded into the register 37 (diagram DOREG) during pe-
riods P19, P20.

[0210] The block transfer into the register 44 (diagram
DO(i)) would take place during periods P20 ... P23 and
transfer onto the channel I/O D2 during periods P21 ...
P24 in partial temporal superimposition with the opera-
tions in progress in the registers DO1 and on channel
I/OD1.

[0211] We now suppose that during period P9 a proc-
essor 3 requests access to the system bus for a write 
operation. 

[0212] This presupposes the availability of the chan-
nel MDAT after 10 clock periods from the period when
the request is asserted, that is to say during periods P20,
P21 when, on the other hand, the channel MDAT will be
engaged to satisfy the read request presented during
period P5.

[0213] Therefore the memory control unit 15 once 
having granted access to the bus, and having recog- 
nised the operation as a write operation (periods P13, 
P14), blocks the transaction by preventing the transfer 
of the address onto the MADDR channel (period P16) 
and the transfer of the data onto the channel MDAT and 
forces the processor 3, with a RETRY signal, presented 
at a predetermined time, as an ARESP signal (periods 
P18, P19) to repeat the access request during period 
P21 or thereafter. 
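
The collision just described can be reproduced with a sketch that places each operation's two-period occupation of the channel MDAT relative to the period in which the request was asserted. The offsets are read off the Figure 6 examples in the text (write requested in P1 occupies MDAT in P12, P13; read requested in P5 occupies MDAT in P20, P21) and are assumptions of this sketch rather than stated constants; the names `mdat_window` and `collides` are hypothetical.

```python
WRITE_OFFSET = 11  # request in P1 -> MDAT in P12, P13 (from the write example)
READ_OFFSET = 15   # request in P5 -> MDAT in P20, P21 (from the read example)


def mdat_window(kind, request_period):
    """Set of periods in which the operation occupies the channel MDAT."""
    offset = WRITE_OFFSET if kind == "write" else READ_OFFSET
    start = request_period + offset
    return {start, start + 1}  # MDAT is engaged for two periods


def collides(op_a, op_b):
    """True if two (kind, request_period) operations overlap on MDAT."""
    return bool(mdat_window(*op_a) & mdat_window(*op_b))


conflict = collides(("write", 9), ("read", 5))     # processor 3 vs processor 1
no_conflict = collides(("write", 1), ("write", 3))  # processors 1 and 2
```

The write requested in P9 needs MDAT in P20, P21, exactly the window of the read requested in P5, so `conflict` is true and a RETRY is required; the two writes requested in P1 and P3 use P12, P13 and P14, P15 respectively and do not interfere.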

[0214] It can therefore be seen that whilst the execu-
tion of a data transfer operation requires 9 clock periods 
to write in memory and 18 clock periods to read from 
memory, the interference time and possible collision 
time between two transfer operations is limited to only 
two clock periods. 

[0215] Partially superimposed transfers are therefore
possible, which take place with the use of different mem-
ory resources (modules), different processor channels
(I/O (i)) and buffering, serialisation and paral-
lelisation resources dedicated to each processor chan-
nel in the DCB logic.

[0216] It is also an immediate conclusion, on the basis 
of the diagram of Figure 5 and on the diagrams of Figure 
6, that in the event of intervention of a processor to trans- 
fer a modified item of data to another processor the 
transfer operation can take place directly in serial form 
of pairs of bytes, from one register functionally equiva-
lent to the register 49 of Figure 5, to a register function-
ally equivalent to the register 44, via one of the channels
50, 46, 47, 48.

[0217] This transfer, as already seen with reference 
to the timing diagrams of Figure 3, can be integrally tem- 
porally superimposed over one or more transfers be- 
tween processors and memory. 
[0218] It is clear that the preceding description relates
only to a preferred embodiment of the invention and that
many variations can be introduced. 
[0219] The number of processors and memory mod- 
ules (4 processors and 6 modules in the preferred em- 
bodiment) can be chosen at will as can the parallelism 
ratio between memory parallelism and processor paral- 
lelism. 

[0220] More data cross bar logic components can be 
utilised in parallel to achieve multiple parallelism and the 
DCB logics can be provided, in addition to parity check 
circuits, with error correction and code generation for 
detection and correction of errors as well as circuits for 
alignment of individual bytes of information, as well as 
circuits for combining (merging) information read from 
memory with information coming from processors for 
the partial modification of memory information. 
[0221] It is also possible to use separate signals for 
arbitration of access to the address and command bus- 
es (ABREQ(i)) and for access to the data channel 
(DBREQ(i)), characterising the transaction up to the 
presentation of the request for access as read/write or
other type and to condition the grant of the bus on the
availability of the necessary resources present or
planned, thereby reducing to the minimum the cases of
"RETRY" and therefore with an optimum exploitation of
the system bus. 

[0222] To allow the same processor to receive data in 
rapid succession following consecutive read requests,
the register 42 can be constituted by a plurality of reg- 
isters in cascade or a "stack" of the "FIFO" type (first in- 
first out). 

[0223] The same concept can be used to avoid RE- 
TRY operations in the case of write operations with re- 
source conflict, by storing the write operation in an input 
buffer disposed downstream of the registers 51, 52, 54,
55 of Figure 5 and by arranging a similar buffer register 
for temporary deposit of the address in the control unit 
15. 

[0224] In this way a write operation which cannot be 
performed in a predetermined period can be deferred to 
a subsequent period in which the necessary resources 
are available. 

[0225] Moreover, it is not indispensable that all or part 
of the processors are provided with caches in that the 
advantages offered by the architecture forming the sub- 
ject of the present invention are achieved by the fact that 
transfer of data between processors can be carried out 
in superimposition with transfers between processors 
and memory. 

[0226] Finally it must be clear that in the preceding 
description the term "processor" can be taken to include 
also a group or "cluster" of processors. These can be 
interconnected together with a local bus and communi- 
cate with the system bus and with the point-to-point 
channels for the transfer of data through an interface 
adaptor so that, in its external effects, the group of proc-
essors is seen as a single processor.
[0227] Direct connection without interface adaptor, of 
groups of processors directly to the system bus is also 
possible, each processor of the group being directly 
connected to the same data transfer channel, which can 
be considered as a multipoint data bus relative to the 
connections with several processors and a point-to- 
point data bus in relation to the aggregate of processors 
and to its connection to the data cross bar 16.
[0228] Naturally, in this case the "transfer rate" of data
would be less due to the greater loads on the data chan- 
nel, and it would be necessary to adopt a lower clock 
signal frequency CK. 

[0229] Alternatively, in the case of a system in which 
a first plurality of individual processors communicates 
with the data cross bar through a plurality of point-to- 
point data channels and a second plurality of processors 
(for example processors functioning as peripheral con- 
trollers which have lower speed requirements) commu- 
nicate with the data cross bar through a single multipoint 
(data channel) bus, the data transfer on this bus could 
take place with the occupation of the bus for several 
clock periods (for example 2) maintaining unchanged
the transfer frequency (a single clock period for each
block transferred).

[0230] This solution is evidently convenient only if the
data cross bar is of the type shown in Figure 5, that is to
say provided with buffer registers.

[0231] Figure 7, like Figure 1, schematically illustrates
this type of architecture. The different elements which 
are functionally equivalent to those of Figure 1 are iden- 
tified with the same reference numerals. 

[0232] The diagram of Figure 7 differs from that of Fig-
ure 1 only by the fact that the processors 1 and 2 are 
constituted by a pair of processors. 
[0233] The processor 1 is constituted by two proces-
sors 101, 102 directly connected to the ACBUS bus 17
and to the data channel I/O D1.

[0234] These two processors are seen from the mem-
ory control and arbitration unit 15 as two separate proc-
essors in competition with one another not only for ac-
cess to the command and address bus, but also for ac-
cess to the data channel I/OD1.

[0235] The unit 15 can take account of this fact both 
in the arbitration unit and in its finite state logic, and it is 
evident that the two processors 101 and 102 cannot 
conduct transactions on data channel I/O D1 in temporal
superimposition.

[0236] The processor 2 is constituted by two proces- 
sors 103, 104 and by an interface adaptor 105. 
[0237] The two processors 103, 104 communicate
with one another and with the interface logic 105 through a local
bus 106 of conventional type.

[0238] The interface logic 105 is connected to the
command and address bus 17 and to the data channel
I/O D2 and arbitrates access to the local bus 106 by rec-
ognising the access requests presented by the proces-
sors 103, 104 for the system bus 17 and the data chan-
nel I/OD2.

[0239] These requests are transferred to the system 
bus in accordance with the protocol and timing of this 
bus. 

[0240] It is evident that, whilst the local bus 106 can
be of asynchronous type and the operation of the proc-
essors 103 and 104 can be of asynchronous type, the
unit 105 must be timed by the CK signal so as to operate
in synchronism with the other elements of the system.
The processors 101, 102 which communicate directly
with the system bus 17 and with the channel I/O D1 must
be subject to the same condition.
[0241] The two processors 103, 104 are seen by the
unit 15 as a single processor and the unit 105 performs
the function of diverting the message data received to
one or the other of the processors.



Claims 


1. A multi-processor system in which a plurality of 
groups of processors (1, 2, 3, 4) each group com- 
prising at least one processor, has access to a plu-
rality of shared memory modules (10, 11, 12, 13,
113, 114), the operation of said group of processors 
and of said memory modules being timed by a com- 
mon synchronisation signal (CK), comprising: 

5 

a memory control unit (15) timed by the said 
synchronisation signal, 

a multi-point system bus (17) connected to said 
group of processors (1 , 2, 3, 4) and to said 
memory control unit (1 5) to transfer addresses 10 
and operation commands to said memory con- 
trol unit (15), other than data to be transferred 
between said groups of processors and said 
modules and between said groups of proces- 
sors, 15 
interconnection logic circuits (16), 
a plurality of point-to-point connection channels 
(I/OD1, I/OD2, I/OD3, I/OD4) one for each of 
said groups of processors, for transferring data 
between said groups of processors and said 20 
modules (10, 11 114) and between said 
groups of processors (1 , 2, 3, 4) other than ad- 
dresses for addressing said modules, each of 
said connection channels (l/ODi) individually 
connecting one of said groups of processors ( 1 , * 5 
2. 3, 4) to said interconnection logic circuits 
(16), 

a memory address channel (18) connected to 
said memory control unit (17) and to said mod- 
ules for addressing of said modules by said 30 
control unit (15), 

a memory data transfer channel (1 g) for input/ 
output transfers from said modules, for cou- 
pling said modules to said interconnection logic 
circuits (16), said interconnection logic circuits 35 
(16) being controlled by said memory control 
unit (15) for selectively connecting said point- 
to-point connection channels (l/ODi) to said 
memory data transfer channel (19) and be- 
tween themselves, and 40 
- control logic circuits (72, 73, 74, 75) provided 
in said memory control unit (15) for receiving 
via said system bus (1 7) an ordered succession 
of associated commands and addresses 
placed by said processor groups (1 , 2, 3, 4) on- 45 
to said system bus (17), and said control logic 
circuits (72-75) being operative to identify re- 
sources requested for the execution of said 
commands and their availability in time, to 
transfer said associated commands and ad- so 
dresses onto said address channel (18), to- 
gether with signals for selection of a module if 
said resources are available, and said control 
logic circuits being also operative to command 
and to time the selective interconnection of said 55 
point-to-point connection channels (l/ODi) be- 
tween themselves and with said memory data 
transfer channel (19) in said interconnection 



logic circuits (16). 

2. A system as in Claim 1, in which said interconnection
   logic circuits (16) comprise an input data holding
   register (37, 49, 42) for each channel (19, I/ODi)
   coupled to said logic circuits.

3. A system as in Claim 2, in which said interconnection
   logic circuits (16) include a holding register (44,
   62) for data output from said interconnection circuits
   (16) for each of said channels (19, I/ODi) coupled
   to said logic circuits (16).

4. A system as in Claim 3, in which:

   - said memory data transfer channel (19) has a
     parallelism which is a multiple of the parallelism
     of said point-to-point connection channels
     (I/ODi), and
   - said interconnection logic circuits (16) comprise,
     for each point-to-point channel (I/ODi)
     coupled to said interconnection logic circuits
     (16), a plurality of registers (49, 51, 52, 54, 57)
     connected in cascade for accumulation of a plurality
     of data received in succession, and
     means (56) for simultaneous transfer of said
     plurality of data to an output data holding register
     (62) for data output to said memory data
     transfer channel (19).

5. A system as in Claim 4, in which said interconnection
   logic circuits (16) include a plurality of multiplexers
   (31) each coupled between a data input register
   (42) of said memory data transfer channel (19) and
   a respective output data holding register (44) for
   holding data output to one of said point-to-point
   channels (I/ODi) for transfer in succession of portions
   of said data held in said input register (42) to
   said holding register (44) for holding data output to
   an associated point-to-point channel (I/ODi).

6. A system as in Claim 5, in which said multiplexers
   (31) have a plurality of groups of inputs each coupled
   to the outputs of one of said holding registers
   for data input from one of said point-to-point channels
   (I/ODi).

7. A system as in Claim 1, in which said control logic
   circuits (15) include a system bus access arbitration
   unit (70) for arbitrating access of said processor
   groups (1, 2, 3, 4) to said system bus (17).

8. A system as in Claim 7, in which said control logic
   circuits (15) include means (SNOOP OUTi) (72) for
   receiving from said groups of processors (1, 2, 3, 4)
   intervention request signals for modification of an
   item of data read in one of said modules on request
   of a first of said groups, and for controlling said
   interconnection logic circuits (16) to transfer a modified
   data unit provided from a second of said
   groups to said first group.



Patentansprüche

1. Mehrprozessorsystem, bei dem mehrere Prozessorgruppen
   (1, 2, 3, 4) mit jeweils mindestens einem
   Prozessor auf mehrere gemeinsame Speichermodule
   (10, 11, 12, 13, 113, 114) zugreifen können,
   wobei der Betrieb der Prozessorgruppen und
   der Speichermodule durch ein gemeinsames
   Synchronisiersignal (CK) zeitlich gesteuert wird,
   umfassend:

   - eine durch das Synchronisiersignal zeitlich
     gesteuerte Speichersteuereinheit (15),
   - einen Mehrpunkt-Systembus (17), der mit den
     Prozessorgruppen (1, 2, 3, 4) und der
     Speichersteuereinheit (15) verbunden ist, um
     Adressen und Betriebsbefehle, die keinen
     zwischen den Prozessorgruppen und den Modulen
     oder zwischen den Prozessorgruppen
     selbst zu übertragenden Daten entsprechen,
     zu der Speichersteuereinheit (15) zu übertragen,
   - Verbindungslogikschaltungen (16),
   - mehrere Punkt-zu-Punkt-Verbindungskanäle
     (I/OD1, I/OD2, I/OD3, I/OD4), von denen jeweils
     einer einer der Prozessorgruppen zugeordnet
     ist, um Daten, die keinen Adressen zur
     Adressierung der Module entsprechen,
     zwischen den Prozessorgruppen und den Modulen
     (10, 11 ... 114) und zwischen den
     Prozessorgruppen (1, 2, 3, 4) selbst zu übertragen,
     wobei jeder Verbindungskanal (I/ODi) individuell
     eine der Prozessorgruppen (1, 2, 3, 4) mit den
     Verbindungslogikschaltungen (16) verbindet,
   - einen Speicheradressenkanal (18), der zur
     Adressierung der Module mit Hilfe der
     Steuereinheit (15) mit der Speichersteuereinheit (15)
     und den Modulen verbunden ist,
   - einen Speicherdaten-Übertragungskanal (19)
     für Eingangs-/Ausgangsübertragungen der
     Module, um die Module mit den
     Verbindungslogikschaltungen (16) zu verbinden, wobei die
     Verbindungslogikschaltungen (16) durch die
     Speichersteuereinheit (15) derart angesteuert
     werden, daß die Punkt-zu-Punkt-Verbindungskanäle
     (I/ODi) selektiv mit dem Speicherdaten-
     Übertragungskanal (19) und mit sich selbst
     verbunden werden, und
   - Steuerlogikschaltungen (72, 73, 74, 75), die in
     der Speichersteuereinheit (15) vorgesehen
     sind, um über den Systembus (17) eine geordnete
     Folge von durch die Prozessorgruppen (1,
     2, 3, 4) auf den Systembus (17) gelegten Befehlen
     und Adressen zu empfangen, wobei die
     Steuerlogikschaltungen (72-75) derart ausgestaltet
     sind, daß sie für die Ausführung der Befehle
     angeforderte Ressourcen sowie deren
     zeitliche Verfügbarkeit identifizieren und die
     entsprechenden Befehle und Adressen zusammen
     mit Signalen zur Auswahl eines Moduls
     auf den Adressenkanal (18) übertragen, falls
     die Ressourcen verfügbar sind, und wobei die
     Steuerlogikschaltungen zudem derart ausgestaltet
     sind, daß sie die selektive Verbindung
     der Punkt-zu-Punkt-Verbindungskanäle
     (I/ODi) zwischen ihnen selbst und mit dem
     Speicherdaten-Übertragungskanal (19) in den
     Verbindungslogikschaltungen (16) befehlen und
     zeitlich steuern.

2. System nach Anspruch 1,
   wobei die Verbindungslogikschaltungen (16) für jeden
   Kanal (19, I/ODi) ein mit den Logikschaltungen
   verbundenes Eingangsdaten-Halteregister (37, 49,
   42) umfassen.

3. System nach Anspruch 2,
   wobei die Verbindungslogikschaltungen (16) für jeden
   an die Verbindungslogikschaltungen (16) angeschlossenen
   Kanal (19, I/ODi) ein Halteregister
   (44, 62) für von den Verbindungslogikschaltungen
   (16) ausgegebene Daten aufweisen.

4. System nach Anspruch 3,
   wobei:

   - der Speicherdaten-Übertragungskanal (19) eine
     Parallelität aufweist, die einem Vielfachen
     der Parallelität der Punkt-zu-Punkt-Verbindungskanäle
     (I/ODi) entspricht, und
   - die Verbindungslogikschaltungen (16) für jeden
     an die Verbindungslogikschaltungen (16) angeschlossenen
     Punkt-zu-Punkt-Kanal (I/ODi)
     mehrere kaskadenartig verschaltete Register
     (49, 51, 52, 54, 57) umfassen, um mehrere
     nacheinander empfangene Daten anzusammeln,
     sowie Mittel (56) zur gleichzeitigen Übertragung
     der mehreren Daten zu einem
     Ausgangsdaten-Halteregister (62), um die Daten
     an den Speicherdaten-Übertragungskanal (19)
     auszugeben.

5. System nach Anspruch 4,
   wobei die Verbindungslogikschaltungen (16) mehrere
   Multiplexer (31) enthalten, wobei jeder Multiplexer
   zwischen ein Dateneingaberegister (42) des
   Speicherdaten-Übertragungskanals (19) und ein
   entsprechendes Ausgangsdaten-Halteregister (44)
   zum Halten von an einen der Punkt-zu-Punkt-Kanäle
   (I/ODi) ausgegebenen Daten geschaltet ist,
   um nacheinander Teile der in dem Eingangsregister
   (42) gehaltenen Daten zu dem zum Halten der an
   einen entsprechenden Punkt-zu-Punkt-Kanal
   (I/ODi) auszugebenden Daten vorgesehenen
   Halteregister (44) zu übertragen.

6. System nach Anspruch 5,
   wobei die Multiplexer (31) mehrere Gruppen von
   Eingängen aufweisen, die jeweils mit einem der
   Ausgänge der zur Dateneingabe von einem der
   Punkt-zu-Punkt-Kanäle (I/ODi) vorgesehenen
   Halteregister verbunden sind.

7. System nach Anspruch 1,
   wobei die Steuerlogikschaltungen (15) eine
   Systembuszugriff-Arbitrierungseinheit (70) zum
   Arbitrieren des Zugriffs der Prozessorgruppen (1, 2, 3,
   4) auf den Systembus (17) aufweisen.

8. System nach Anspruch 7,
   wobei die Steuerlogikschaltungen (15) Mittel
   (SNOOP OUTi) (72) aufweisen, um von den
   Prozessorgruppen (1, 2, 3, 4) Eingriffsanforderungssignale
   zur Veränderung eines gelesenen Datenelements
   in einem der Module bei Anforderung durch
   eine erste Gruppe zu empfangen und die
   Verbindungslogikschaltungen (16) derart anzusteuern, daß
   eine von einer zweiten Gruppe der ersten Gruppe
   zugeführte veränderte Dateneinheit übertragen
   wird.



Revendications

1. Système multiprocesseur dans lequel plusieurs
   groupes de processeurs (1, 2, 3, 4), chaque groupe
   comprenant au moins un processeur, ont accès à
   plusieurs modules de mémoire partagée (10, 11,
   12, 13, 113, 114), le fonctionnement du groupe de
   processeurs et des modules de mémoire étant cadencé
   par un signal de synchronisation commun
   (CK), comprenant :

   - un organe de commande de mémoire (15) cadencé
     par le signal de synchronisation,
   - un bus de système multipoint (17) relié au groupe
     de processeurs (1, 2, 3, 4) et à l'organe de
     commande de mémoire (15) pour le transfert à
     ce dernier d'adresses et de commandes d'opérations
     autres que des données à transférer entre
     les groupes de processeurs et les modules
     et entre les groupes de processeurs,
   - des circuits logiques d'interconnexion (16),
   - plusieurs voies de liaison point à point (I/OD1,
     I/OD2, I/OD3, I/OD4), une pour chacun des
     groupes de processeurs, pour le transfert entre
     les groupes de processeurs et les modules (10,
     11, ..., 114) et entre les groupes de processeurs
     (1, 2, 3, 4) de données autres que des adresses
     pour l'adressage des modules, chacune de ces
     voies de liaison (I/ODi) reliant individuellement
     un des groupes de processeurs (1, 2, 3, 4) aux
     circuits logiques d'interconnexion (16),
   - une voie d'adressage de mémoire (18) reliée à
     l'organe de commande de mémoire (15) et aux
     modules pour l'adressage des modules par cet
     organe de commande (15),
   - une voie de transfert de données en mémoire
     (19) pour des transferts d'entrée-sortie des modules,
     pour la liaison des modules aux circuits
     logiques d'interconnexion (16), les circuits logiques
     d'interconnexion (16) étant commandés
     par l'organe de commande de mémoire (15)
     pour relier sélectivement les voies de liaison
     point à point (I/ODi) à la voie de transfert de
     données en mémoire (19) et entre elles, et
   - des circuits logiques de commande (72, 73, 74,
     75) prévus dans l'organe de commande de mémoire
     (15) pour recevoir par le bus de système
     (17) une suite ordonnée de commandes et
     d'adresses associées mise sur le bus de système
     (17) par les groupes de processeurs (1,
     2, 3, 4), ces circuits logiques de commande (72
     à 75) ayant comme fonction d'identifier les ressources
     demandées pour l'exécution des commandes
     et la disponibilité de ces ressources
     dans le temps, pour transférer les commandes
     et adresses associées sur la voie d'adressage
     (18), conjointement avec des signaux de sélection
     d'un module si les ressources sont disponibles,
     et ces circuits logiques de commande
     ayant aussi comme fonction de commander et
     cadencer l'interconnexion sélective des voies
     de liaison point à point (I/ODi) entre elles et
     avec la voie de transfert de données en mémoire
     (19) dans les circuits logiques d'interconnexion
     (16).

2. Système selon la revendication 1, dans lequel les
   circuits logiques d'interconnexion (16) comprennent
   un registre de conservation de données d'entrée
   (37, 49, 42) pour chaque voie (19, I/ODi) reliée
   à ces circuits logiques.

3. Système selon la revendication 2, dans lequel les
   circuits logiques d'interconnexion (16) contiennent
   un registre de conservation (44, 62) pour les données
   émises de ceux-ci pour chacune des voies
   (19, I/ODi) reliées à ceux-ci.

4. Système selon la revendication 3, dans lequel :

   - la voie de transfert de données en mémoire
     (19) a un parallélisme multiple du parallélisme
     des voies de liaison point à point (I/ODi), et
   - les circuits logiques d'interconnexion (16) comprennent,
     pour chaque voie point à point (I/ODi)
     reliée à eux, plusieurs registres (49, 51, 52, 54,
     57) montés en cascade pour l'accumulation de
     données reçues successivement, et un moyen
     (56) de transfert simultané de ces données à
     un registre de conservation de données de sortie
     (62) pour des données émises à destination
     de la voie de transfert de données en mémoire
     (19).

5. Système selon la revendication 4, dans lequel les
   circuits logiques d'interconnexion (16) contiennent
   plusieurs multiplexeurs (31) montés chacun entre
   un registre d'entrée de données (42) de la voie de
   transfert de données en mémoire (19) et un registre
   respectif de conservation de données de sortie (44)
   pour la conservation de données émises à destination
   d'une des voies point à point (I/ODi) pour le
   transfert successif de parties des données conservées
   dans le registre d'entrée (42) au registre de
   conservation (44) pour la conservation de données
   émises à destination d'une voie point à point (I/ODi)
   associée.

6. Système selon la revendication 5, dans lequel les
   multiplexeurs (31) ont plusieurs groupes d'entrées
   reliés chacun aux sorties d'un des registres de conservation
   pour des données entrées en provenance
   d'une des voies point à point (I/ODi).

7. Système selon la revendication 1, dans lequel les
   circuits logiques de commande (15) contiennent un
   organe d'arbitrage (70) pour l'arbitrage de l'accès
   des groupes de processeurs (1, 2, 3, 4) au bus de
   système (17).

8. Système selon la revendication 7, dans lequel les
   circuits logiques de commande (15) contiennent un
   moyen (SNOOP OUTi) (72) pour la réception des
   groupes de processeurs (1, 2, 3, 4) de signaux de
   demande d'intervention pour la modification d'un
   élément d'information lu dans un des modules sur
   demande d'un premier de ces groupes et pour la
   commande des circuits logiques d'interconnexion
   (16) pour le transfert d'un ensemble de données
   modifié, prévu d'un deuxième des groupes au premier.